Abstract

Recently, an extension of independent component analysis (ICA) from one to multiple datasets, termed independent vector
analysis (IVA), has been the subject of significant research interest. IVA has also been shown to be a generalization of Hotelling's
canonical correlation analysis. In this paper, we provide the identification conditions for a general IVA formulation, which accounts
for linear, nonlinear, and sample-to-sample dependencies. The identification conditions are a generalization of previous results
for ICA and for IVA when samples are independently and identically distributed. Furthermore, a principal aim of IVA is the
identification of dependent sources between datasets. Thus, we provide the additional conditions for when the arbitrary ordering
of the sources within each dataset is common. Performance bounds in terms of the Cramer-Rao lower bound are also provided for
the demixing matrices and interference to source ratio. The performance of two IVA algorithms is compared to the theoretical
bounds.



I. Motivation and Introduction 



Blind source separation (BSS) problems have been well studied, and many algorithms have been developed and successfully
applied in a vast array of applications [1], [2]. A generalization of the BSS problem to multiple datasets, termed joint blind
source separation (JBSS), has been introduced recently. The recent interest in JBSS is motivated by various application
domains, such as analyzing multisubject datasets in biomedical studies using functional magnetic resonance imaging or
electroencephalography data [3], or solving the convolutive independent component analysis (ICA) problem in the
frequency domain using multiple frequency bins. Interestingly, several algorithms developed prior to the development of
the BSS concept are capable of achieving JBSS [6], [7]. Thus, a much larger set of applications than the examples above is
well treated using the JBSS formulation.



One particular formulation of JBSS has been termed independent vector analysis (IVA). The formulation of IVA is an
extension of the (linear, instantaneous) ICA model. IVA assumes a source within one dataset is dependent on at most one source
in another dataset, while sources within a dataset are mutually independent (as in ICA). Thus, IVA reduces to performing ICA
individually on each dataset when sources possess no dependence across datasets. Of particular interest here is to determine the
conditions when IVA is identifiable. For a real-valued single dataset problem, independent sources can be 'blindly' identified up
to a permutation and scaling ambiguity as long as no two sources are Gaussian with proportional sample-to-sample correlation
matrices [2, Chapter 4]. The IVA framework has been shown to possess an additional type of diversity which can be exploited
for identifying sources that cannot be identified by ICA.



In this paper, a general framework for IVA is presented. By 'general' we mean an IVA formulation that accounts for
dependency between samples, i.e., when the samples are not independently and identically distributed (iid). Prior to introducing
this IVA formulation in Section IV, we give a review of existing IVA algorithms in Section II and define our mathematical
conventions and notations in Section III. Naturally, IVA can be achieved by maximizing the likelihood function, which is
shown in Section V to be the same in practice as minimizing the entropy rate (subject to a regularity term). The likelihood
function has an associated Fisher information matrix (FIM) of a form that we describe in Section VI. The FIM is used in
deriving the identification conditions and source separation performance bounds in Sections VII and VIII, respectively. The
IVA identification conditions and performance bounds are generalizations of the results for ICA (of a single dataset). The IVA
case when samples are iid is shown to have a performance bound that can be expressed compactly for the very large class of
multivariate elliptical distributions. In Section IX, the performance bounds are compared to the performance achieved by two
previously published algorithms for IVA. In the last section, we discuss directions for future work.



II. Review of Existing IVA Algorithms 

As mentioned previously, the origins of algorithms that can be used for IVA date back to pre-ICA times. In fact, classical
canonical correlation analysis (CCA) [9] achieves IVA for linearly dependent sources in the analysis of two datasets. The formulation
of CCA can be shown to serve as a basis for all IVA algorithms reviewed here. This is because CCA can be derived from
two different, but related, principles: maximum likelihood and eigenanalysis (diagonalization). Here, we choose to separate the
approaches into three classes for our review based on the source diversity exploited to achieve JBSS. It will be shown that
each type of diversity can be utilized, independent of the other two, to achieve IVA.

A. Linear dependence

The first class is applicable to problems in which the sources are assumed to have linear dependence across datasets, but are
linearly independent within datasets. The earliest approaches to extending CCA beyond two datasets are summarized in [6]
and have been termed multiset canonical correlation analysis (MCCA) in [4]. The approaches within MCCA use cost functions
based on second-order statistics that result in JBSS solutions that can be widely applied. Another approach to JBSS for linearly
dependent sources can be derived using, equivalently, maximum likelihood or minimization of mutual information, and results
in IVA with a multivariate Gaussian distribution model [10], [11].

Since CCA can be achieved using generalized eigenvalue decomposition, it can also be posed as a diagonalization problem,
which can be readily extended to achieve IVA using 'generalized joint diagonalization' [12]. For IVA of linearly dependent
sources, the covariance and cross-covariance matrices among the estimated sources in each dataset can be jointly diagonalized.
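As an illustration of the diagonalization view for two datasets, the following NumPy sketch computes CCA by whitening each dataset and taking a singular value decomposition of the whitened cross-covariance. It is a minimal sketch and not the implementation of any cited algorithm; the function name `cca` and the convention that rows are variables and columns are samples are assumptions made here.

```python
import numpy as np

def cca(X1, X2):
    """Classical CCA for two datasets (rows = variables, columns = samples).

    Returns W1, W2 and the canonical correlations rho such that the estimates
    Y1 = W1 @ X1 and Y2 = W2 @ X2 (after centering) have identity covariance
    and a diagonal cross-covariance equal to diag(rho).
    """
    X1 = X1 - X1.mean(axis=1, keepdims=True)
    X2 = X2 - X2.mean(axis=1, keepdims=True)
    V = X1.shape[1]
    C11, C22, C12 = X1 @ X1.T / V, X2 @ X2.T / V, X1 @ X2.T / V

    def inv_sqrt(C):
        # Symmetric inverse square root via the eigendecomposition of C.
        w, U = np.linalg.eigh(C)
        return U @ np.diag(w ** -0.5) @ U.T

    B1, B2 = inv_sqrt(C11), inv_sqrt(C22)
    U, rho, Vt = np.linalg.svd(B1 @ C12 @ B2)
    return U.T @ B1, Vt @ B2, rho
```

The two returned matrices simultaneously diagonalize the within-dataset covariances and the cross-covariance, which is the two-dataset case of the joint diagonalization described above.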



B. Nonlinear dependence 

When the sources possess nonlinear dependence across the datasets, higher-order statistics should be utilized either
explicitly or implicitly. The extension of CCA to nonlinear dependence measures for two datasets dates back to at least 1976
[7]. An extension to multiple datasets is given in [14]. These early works are summarized in [15].

Another extension for nonlinear CCA of two datasets uses nonparametric univariate and bivariate density estimators in order
to maximize the mutual information between two canonical correlation variates [16]. Kernels have also been used to transform
the random vectors into a 'feature-space' where linear CCA is then applied [17], [18]. A different type of transformation
is proposed in [19]. Here measure transform functions are specified for transforming joint probability measures to identify
nonlinearly dependent sources. To use either the kernel or measure transform approaches, one must determine the appropriate
transform and transform parameters to achieve JBSS for the problem at hand.

IVA also provides a framework for exploiting nonlinear dependencies. IVA, as first introduced in [20], [21] and in the similar
work of [22], extends ICA to multiple datasets so as to solve the permutation ambiguity problem associated with frequency
domain ICA. The nonlinear dependencies can be accounted for within the IVA framework by considering non-Gaussian
sources. For example, in [20], [21], a nonlinear score function consistent with the second-order uncorrelated multivariate
Laplacian distribution is used.



As is the case for linear dependence, diagonalization methods for IVA of nonlinearly dependent sources can be utilized.
Specifically, demixing matrices that diagonalize the higher-order statistics (i.e., cumulants of order higher than two) associated
with the estimated sources are found [12].



C. Sample-to-sample dependence 



Naturally, for IVA, as for ICA, algorithms can be developed to exploit sample-to-sample dependence. A generalization of
joint diagonalization provides such a solution by sampling the vector autocorrelation function at different time lags and finding
demixing matrices which minimize correlation between the sources for all time lags.
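The idea of sampling the autocorrelation function at several lags can be sketched as follows. This is an illustrative NumPy example under assumptions made here, not one of the published algorithms: the data are stored as an array of shape `(K, N, V)`, the sources are first-order autoregressive processes with innovations partly shared across datasets, and the helper names are hypothetical. The cost sums the squared off-diagonal entries of the lagged covariance and cross-covariance matrices of the estimates, which should be small when the demixing matrices separate the sources.

```python
import numpy as np

def lagged_cross_cov(Y, k1, k2, lag):
    """Estimate E{y^[k1](v) y^[k2](v + lag)^T} from Y with shape (K, N, V)."""
    V = Y.shape[2]
    Yc = Y - Y.mean(axis=2, keepdims=True)
    return Yc[k1][:, : V - lag] @ Yc[k2][:, lag:].T / (V - lag)

def off_diagonal_energy(W, X, lags=(0, 1, 2)):
    """Sum of squared off-diagonal entries over all dataset pairs and lags for Y = W X."""
    Y = W @ X
    K = Y.shape[0]
    total = 0.0
    for lag in lags:
        for k1 in range(K):
            for k2 in range(K):
                C = lagged_cross_cov(Y, k1, k2, lag)
                total += np.sum((C - np.diag(np.diag(C))) ** 2)
    return total

def ar1(coef, innovations):
    """First-order autoregressive filter s(v) = coef * s(v - 1) + e(v)."""
    s = np.zeros_like(innovations)
    for v in range(1, innovations.size):
        s[v] = coef * s[v - 1] + innovations[v]
    return s

rng = np.random.default_rng(0)
K, N, V = 2, 2, 4000
coefs = (0.9, -0.5)                      # assumed AR coefficients of the two sources
S = np.empty((K, N, V))
for n in range(N):
    shared = rng.standard_normal(V)      # innovation shared across datasets
    for k in range(K):
        S[k, n] = ar1(coefs[n], shared + 0.5 * rng.standard_normal(V))
A = np.array([[[1.0, 0.8], [0.7, 1.0]], [[1.0, -0.5], [0.6, 1.0]]])
X = A @ S                                # X^[k] = A^[k] S^[k] for each dataset
energy_mixed = off_diagonal_energy(np.stack([np.eye(N)] * K), X)
energy_separated = off_diagonal_energy(np.linalg.inv(A), X)
```

With the true inverse mixing matrices, `energy_separated` reflects only finite-sample estimation error, whereas `energy_mixed` is dominated by the cross terms introduced by mixing.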



III. Mathematical Preliminaries 

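For this paper, the domains are restricted to the sets of real ($\mathbb{R}$) and nonnegative natural ($\mathbb{N}$) numbers. Matrices and vectors
from each domain are indicated by $\mathbb{R}^{M\times N}$, $\mathbb{R}^{M}$, $\mathbb{N}^{M\times N}$ and $\mathbb{N}^{M}$, respectively. Scalar, (column) vector, and matrix quantities
are denoted as lower-case light face, lower-case bold face, and upper-case bold face, respectively. The $m$th element of a vector
$\mathbf{v}$, $[\mathbf{v}]_m$, and an element in the $m$th row and $n$th column of a matrix $\mathbf{A}$, $[\mathbf{A}]_{m,n}$, are often denoted $v_m$ and $a_{m,n}$, respectively.

The Kronecker delta, $\delta_{m,n}$, is one when $m = n$ and zero otherwise. The standard basis vector, $\mathbf{e}_n$, is the $n$th column of the
identity matrix, $\mathbf{I}_N \in \mathbb{R}^{N\times N}$. The symbols $\mathbf{0}$ and $\mathbf{1}$ denote matrices (or vectors) with all entries of zeros and ones, respectively, where
the dimensions of the matrices are either known from the context or indicated by an additional subscript.

The superscript $T$ denotes the matrix transpose. The element-wise (Hadamard) product, element-wise division, and Kronecker
products are denoted by $\mathbf{A}\circ\mathbf{B}$, $\mathbf{A}\oslash\mathbf{B}$, and $\mathbf{A}\otimes\mathbf{B}$, respectively. We use $\mathrm{vec}(\mathbf{A}) \in \mathbb{R}^{MN} = \sum_{n=1}^{N}\mathbf{e}_n\otimes(\mathbf{A}\mathbf{e}_n)$, where
$\mathbf{e}_n \in \mathbb{R}^{N}$, to compactly denote the stacking of the columns of $\mathbf{A} \in \mathbb{R}^{M\times N}$. Additionally, if a subset of the rows in
$\mathbf{A}$ are listed in the vector $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_d]^T \in \mathbb{N}^{d}$, where $0 \le d \le M$, with a corresponding indexing matrix $\mathbf{E}_{[\alpha]} =
[\mathbf{e}_{\alpha_1}, \ldots, \mathbf{e}_{\alpha_d}]^T \in \mathbb{R}^{d\times M}$, then $\mathbf{E}_{[\alpha]}\mathbf{A}$ selects the subset of rows in $\mathbf{A}$ indicated by $\boldsymbol{\alpha}$. For compactness, we use $\mathrm{vec}_{\alpha}(\mathbf{A}) =
\mathrm{vec}(\mathbf{E}_{[\alpha]}\mathbf{A})$. The complementing subset of $\boldsymbol{\alpha}$ is indicated by $\boldsymbol{\alpha}^c \in \mathbb{N}^{M-d}$. A diagonal matrix with entries given by $\mathbf{d}$ is
denoted by $\mathrm{Diag}(\mathbf{d}) = \sum_{n} d_n\mathbf{e}_n\mathbf{e}_n^T$. The square matrix $\mathbf{A}$ has diagonal entries, $\mathrm{diag}(\mathbf{A}) = \sum_{n=1}^{N}\mathbf{e}_n\mathbf{e}_n^T\mathbf{A}\mathbf{e}_n$, a trace,
$\mathrm{tr}(\mathbf{A}) = \mathbf{1}^T\mathrm{diag}(\mathbf{A})$, and a determinant, $\det(\mathbf{A})$. We indicate $\mathbf{A} - \mathbf{B}$ is positive definite using $\mathbf{A} \succ \mathbf{B}$ and positive
semidefinite with $\mathbf{A} \succeq \mathbf{B}$. The operator $|\cdot|$ denotes the magnitude.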
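These conventions can be checked numerically. The short NumPy sketch below is an illustration added here (the helper names are hypothetical, and row indices are 0-based in code rather than 1-based as in the text). It verifies the Kronecker form of vec, the row-selection operator, and the relation between diag and the trace.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def vec_via_kron(A):
    """vec(A) as the sum over n of e_n kron (A e_n), with e_n in R^N."""
    N = A.shape[1]
    I = np.eye(N)
    return sum(np.kron(I[:, n], A @ I[:, n]) for n in range(N))

def row_selector(alpha, M):
    """Indexing matrix E_[alpha] of size d x M for 0-based row indices alpha."""
    return np.eye(M)[np.asarray(alpha), :]

A = np.arange(12.0).reshape(3, 4)
alpha = [0, 2]
E = row_selector(alpha, A.shape[0])
vec_alpha = vec(E @ A)                   # vec_alpha(A) = vec(E_[alpha] A)
B = A[:, :3]                             # a square matrix for diag and trace
```

Here `vec_alpha` stacks only the selected rows, column by column, which is the ordering used later when subsets of a source component matrix are vectorized.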

For a matrix $\mathbf{A}$ with block structure, the matrix $\mathbf{A}_{m,n}$ is the $m$th row and $n$th column in the block representation of the
matrix $\mathbf{A}$ using $M$ row partitions and $N$ column partitions. The special block diagonal matrix is necessarily a square matrix
(implying $M = N$) that has off-diagonal partitions being zero, i.e., $\mathbf{A}_{m,n} = \mathbf{0}$ for $1 \le m \ne n \le M$, and is denoted with the
direct sum notation, $\mathbf{A} = \mathbf{A}_{1,1}\oplus\mathbf{A}_{2,2}\oplus\cdots\oplus\mathbf{A}_{M,M} = \oplus\sum_{m=1}^{M}\mathbf{A}_{m,m}$ [25].

The common functions of random variables such as the expectation operator, entropy, and mutual information are denoted
using $E\{\cdot\}$, $\mathcal{H}\{\cdot\}$, and $\mathcal{I}\{\cdot\}$, respectively. A random vector $\mathbf{x}$ following the normal distribution with mean $\boldsymbol{\mu}$ and covariance
matrix $\boldsymbol{\Sigma}$ is denoted $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. We use $\mathbf{x} \perp\!\!\!\perp \mathbf{y}$ to denote that a random vector $\mathbf{x}$ is independent of $\mathbf{y}$. We use standard
elementary functions such as $\log(\cdot)$, $\exp(\cdot)$, $\Gamma(\cdot)$ for the natural logarithm, the anti-logarithm, and the complete Gamma
function.



IV. IVA Problem Formulation 



We begin by formulating the particular JBSS framework of interest, namely IVA, in a more general manner than previously
done [8], [9], [13]. The generalization allows analysis of IVA when the samples are not iid or, alternatively,
when sample dependence is taken into account.

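There are $K$ datasets, each containing $V$ samples, formed from the linear mixture of $N$ independent sources,

$$\mathbf{X}^{[k]} = \mathbf{A}^{[k]}\mathbf{S}^{[k]} \in \mathbb{R}^{N\times V}, \quad 1 \le k \le K.$$

The entry in the $n$th row and $v$th column of $\mathbf{S}^{[k]}$ is $s_n^{[k]}(v)$, the $n$th row of $\mathbf{S}^{[k]}$ is denoted with the column vector $\mathbf{s}_n^{[k]} \in \mathbb{R}^{V}$,
and the $v$th column of $\mathbf{S}^{[k]}$ is denoted by the column vector $\mathbf{s}^{[k]}(v) \in \mathbb{R}^{N}$. The source matrices in each dataset can be concatenated to form
$\mathbf{S} = [(\mathbf{S}^{[1]})^T, \ldots, (\mathbf{S}^{[K]})^T]^T \in \mathbb{R}^{NK\times V}$. Using this notation, we can denote the JBSS data model with a single equation, namely
$\mathbf{X} = \mathbf{A}\mathbf{S}$, where $\mathbf{A} = \oplus\sum_{k=1}^{K}\mathbf{A}^{[k]}$, the invertible mixing matrices $\mathbf{A}^{[k]} \in \mathbb{R}^{N\times N}$, and the sources $\mathbf{S}$ are unknown real-valued
quantities to be estimated. The $n$th source component matrix (SCM), $\mathbf{S}_n = [\mathbf{s}_n^{[1]}, \ldots, \mathbf{s}_n^{[K]}]^T \in \mathbb{R}^{K\times V}$, is independent of all other
SCMs. Then the probability distribution function (pdf) of the concatenated source vector, $\mathbf{S}$, can be written as $p(\mathbf{S}) = \prod_{n=1}^{N} p_n(\mathbf{S}_n)$.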

The IVA solution finds $K$ demixing matrices and the corresponding source estimates for each dataset, with the $k$th ones
denoted as $\mathbf{W}^{[k]}$ and $\mathbf{Y}^{[k]} = \mathbf{W}^{[k]}\mathbf{X}^{[k]}$, respectively. The estimate of the $n$th component from the $v$th sample of the $k$th
dataset is given by $y_n^{[k]}(v) = (\mathbf{w}_n^{[k]})^T\mathbf{x}^{[k]}(v) = \sum_{l=1}^{N} w_{n,l}^{[k]}x_l^{[k]}(v)$, where $(\mathbf{w}_n^{[k]})^T$ is the $n$th row of $\mathbf{W}^{[k]}$. Furthermore, it
is assumed that the mixing matrices possess no known relationship.
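To make the indexing concrete, the following NumPy sketch generates data from this model and checks that the per-dataset equations agree with the single concatenated equation using a block diagonal mixing matrix. It is an illustration only: the array layout `(K, N, V)`, the Gaussian sources with an equicorrelation structure across datasets, and all function names are assumptions made for this example.

```python
import numpy as np

def equicorrelation(K, rho):
    """K x K covariance with unit variances and common correlation rho."""
    return (1.0 - rho) * np.eye(K) + rho * np.ones((K, K))

def make_iva_data(K=3, N=2, V=1000, rhos=(0.9, 0.3), seed=0):
    """Return sources S (K, N, V), mixing matrices A (K, N, N), and mixtures X (K, N, V)."""
    rng = np.random.default_rng(seed)
    S = np.empty((K, N, V))
    for n in range(N):
        L = np.linalg.cholesky(equicorrelation(K, rhos[n]))
        S[:, n, :] = L @ rng.standard_normal((K, V))   # nth SCM, size K x V
    A = rng.standard_normal((K, N, N))                 # one mixing matrix per dataset
    X = A @ S                                          # X^[k] = A^[k] S^[k] for every k
    return S, A, X

S, A, X = make_iva_data()
K, N, V = S.shape
A_block = np.zeros((K * N, K * N))                     # direct sum of the A^[k]
for k in range(K):
    A_block[k * N:(k + 1) * N, k * N:(k + 1) * N] = A[k]
X_concat = A_block @ S.reshape(K * N, V)               # single-equation model X = A S
S_1 = S[:, 0, :]                                       # first SCM: one row per dataset
```

Rows of an SCM such as `S_1` are dependent across datasets, while different SCMs are generated independently, matching the assumption stated above.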



V. IVA Objective Function 



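Just as in ICA, the IVA objective function can be specified to be the maximization of the natural logarithm of the likelihood.
Since $\mathbf{A}$ is block diagonal, the estimate of $\mathbf{A}^{-1} = \mathbf{W} = \oplus\sum_{k=1}^{K}\mathbf{W}^{[k]}$ is block diagonal and thus we choose in the sequel
to use $\mathcal{W} \in \mathbb{R}^{N\times N\times K}$, a three-dimensional 'matrix', to denote the set of parameters to be estimated. We then have that

$$\mathcal{L}(\mathcal{W}) = \log\left(p_{\mathbf{X}}(\mathbf{X})\right) = \log\left(\prod_{n=1}^{N}p_n(\mathbf{Y}_n)\,|\det\mathbf{W}|^{V}\right) = \sum_{n=1}^{N}\log\left(p_n(\mathbf{Y}_n)\right) + V\sum_{k=1}^{K}\log\left|\det\mathbf{W}^{[k]}\right|, \tag{1}$$

where $p_n(\cdot)$ is the model for the distribution characterizing the multivariate source $\mathbf{S}_n$. Note that if $\mathbf{X} = \mathbf{A}\mathbf{S}$, then $\mathrm{vec}(\mathbf{S}) =
(\mathbf{I}_V\otimes\mathbf{A}^{-1})\,\mathrm{vec}(\mathbf{X})$, which implies $p_{\mathbf{X}}(\mathbf{X};\mathbf{A}) = |\det(\mathbf{I}_V\otimes\mathbf{A}^{-1})|\,p_{\mathbf{S}}\left((\mathbf{I}_V\otimes\mathbf{A}^{-1})\,\mathrm{vec}(\mathbf{X})\right) = |\det\mathbf{A}^{-1}|^{V}p_{\mathbf{S}}(\mathbf{S})$.

If we consider the case when $V \to \infty$, then we can define the source component vector (SCV) as a random vector
process and recall the definition of entropy rate [27, Eq. 4.10] so that

$$\mathcal{H}_r\{\mathbf{s}_n\} = \lim_{V\to\infty}\frac{1}{V}\mathcal{H}\{\mathbf{s}_n(1), \ldots, \mathbf{s}_n(V)\} = -\lim_{V\to\infty}\frac{1}{V}E\{\log p_n(\mathbf{S}_n)\}. \tag{2}$$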



By normalizing the likelihood objective function by $V$ and considering the limit,

$$\mathcal{C}_{\mathrm{IVA}}(\mathcal{W}) \triangleq -\lim_{V\to\infty}\frac{1}{V}\mathcal{L}(\mathcal{W}) = \sum_{n=1}^{N}\mathcal{H}_r\{\mathbf{y}_n\} - \sum_{k=1}^{K}\log\left|\det\mathbf{W}^{[k]}\right| = \sum_{n=1}^{N}\left(\sum_{k=1}^{K}\mathcal{H}_r\{y_n^{[k]}\} - \mathcal{I}_r\{\mathbf{y}_n\}\right) - \sum_{k=1}^{K}\log\left|\det\mathbf{W}^{[k]}\right|, \tag{3}$$

we can observe that IVA minimizes the entropy rate of the estimated SCVs (subject to the regularization term). This
representation explains that the IVA objective function will equally weight the minimization of the source entropy rates and the
maximization of the across dataset dependence measure provided by the mutual information rate of $\mathbf{y}_n$. It is also clear that the
mutual information rate portion of the IVA objective function is responsible for resolving the permutation ambiguity across
multiple datasets, since without the mutual information rate of the SCVs the objective function would be identical to using
ICA on each of the $K$ datasets. This representation will be useful in our identifiability discussion in Section VII.
In the sequel, we will use the multivariate score function $\boldsymbol{\Phi}_n = \boldsymbol{\Phi}(\mathbf{Y}_n) = -\partial\log(p_n(\mathbf{Y}_n))/\partial\mathbf{Y}_n \in \mathbb{R}^{K\times V}$.
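For intuition, a finite-sample version of this cost can be written down once a source model is chosen. The sketch below assumes, purely for illustration, that every SCV is modeled as zero-mean multivariate Gaussian with iid samples, so that the entropy of each estimated SCV is $\frac{1}{2}\log\det$ of its covariance plus a constant. The function name and the array layout `(K, N, V)` are assumptions made here, not part of the original text.

```python
import numpy as np

def iva_gaussian_cost(W, X):
    """Sample IVA cost under an assumed iid zero-mean Gaussian SCV model.

    W has shape (K, N, N) and X has shape (K, N, V). The cost is the sum over n
    of the Gaussian entropy of the nth estimated SCV minus the sum over k of
    log|det W^[k]|.
    """
    K, N, V = X.shape
    Y = W @ X                                   # Y^[k] = W^[k] X^[k]
    cost = 0.5 * N * K * np.log(2.0 * np.pi * np.e)
    for n in range(N):
        Yn = Y[:, n, :]                         # nth estimated SCM, K x V
        cost += 0.5 * np.linalg.slogdet(Yn @ Yn.T / V)[1]
    cost -= sum(np.linalg.slogdet(W[k])[1] for k in range(K))
    return cost

rng = np.random.default_rng(1)
K, N, V = 2, 2, 5000
rhos = (0.9, 0.3)                               # assumed cross-dataset correlations
S = np.empty((K, N, V))
for n in range(N):
    R = np.array([[1.0, rhos[n]], [rhos[n], 1.0]])
    S[:, n, :] = np.linalg.cholesky(R) @ rng.standard_normal((K, V))
A = np.array([[[1.0, 0.8], [0.7, 1.0]], [[1.0, -0.5], [0.6, 1.0]]])
X = A @ S
cost_true = iva_gaussian_cost(np.linalg.inv(A), X)
cost_mixed = iva_gaussian_cost(np.stack([np.eye(N)] * K), X)
```

Under this Gaussian model the cost differs from a constant only by the (sample) mutual information among the estimated SCVs, so it is smallest when the estimated SCVs are uncorrelated with one another, as they are at the true demixing matrices.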



VI. IVA Fisher Information Matrix 



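Here we derive the FIM of (1) with respect to (wrt) $\mathcal{W}$. The $KN^2$ parameters result in a $KN^2\times KN^2$ dimension FIM with
the entry associated with $w_{m_1,n_1}^{[k_1]}$ and $w_{m_2,n_2}^{[k_2]}$ denoted by $[\mathbf{F}(\mathcal{W})]_{k_2,m_2,n_2}^{k_1,m_1,n_1}$ and computed as:

$$[\mathbf{F}(\mathcal{W})]_{k_2,m_2,n_2}^{k_1,m_1,n_1} = E\left\{\frac{\partial\mathcal{L}(\mathcal{W})}{\partial w_{m_1,n_1}^{[k_1]}}\,\frac{\partial\mathcal{L}(\mathcal{W})}{\partial w_{m_2,n_2}^{[k_2]}}\right\}. \tag{4}$$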



For the purposes of determining identifiability and the performance bound, we need only consider the FIM locally around
a solution, i.e., $\mathbf{W} = \mathbf{A}^{-1}$, where $\mathbf{A}$ and $\mathbf{W}$ are "freely" chosen so as to alleviate all scale and permutation ambiguities. In
general, this leads to a complex expression that depends on $\mathbf{A}$; fortunately this complexity is unnecessary. Due to the
invariance of the induced Cramer-Rao lower bound (iCRLB) on $\mathbf{G} = \mathbf{W}\mathbf{A}$ wrt the mixing matrix $\mathbf{A} = \oplus\sum_{k=1}^{K}\mathbf{A}^{[k]}$, we need only
consider $\mathbf{A} = \mathbf{I}$, i.e., the Cramer-Rao lower bound (CRLB) of $\mathbf{G}$ depends only on the statistics of the sources [28]. Thus the
matrix of interest is

$$[\mathbf{F}]_{k_2,m_2,n_2}^{k_1,m_1,n_1} \triangleq \left.[\mathbf{F}(\mathcal{W})]_{k_2,m_2,n_2}^{k_1,m_1,n_1}\right|_{\mathbf{A}=\mathbf{I},\,\mathbf{W}=\mathbf{I}}. \tag{5}$$



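It will prove useful to define $\mathcal{K}_{m,n} \triangleq V^{-1}E\{\mathrm{diag}(\boldsymbol{\Phi}_m\mathbf{S}_n^T)\,\mathrm{diag}^T(\boldsymbol{\Phi}_m\mathbf{S}_n^T)\} \in \mathbb{R}^{K\times K}$, $1 \le m, n \le N$, to describe the form
of the block diagonal FIM compactly. In Appendix A we show that the first $N$ block entries of the FIM are given by
$\mathbf{F}_n = \mathrm{cov}\{\mathrm{diag}(\boldsymbol{\Phi}_n\mathbf{S}_n^T)\} = V(\mathcal{K}_{n,n} - V\mathbf{1}_{K\times K}) \in \mathbb{R}^{K\times K}$ and the remaining block entries are defined for $n > m$ as

$$\mathbf{F}_{m,n} = \mathrm{cov}\left\{\begin{bmatrix}\mathrm{diag}(\boldsymbol{\Phi}_m\mathbf{S}_n^T)\\ \mathrm{diag}(\boldsymbol{\Phi}_n\mathbf{S}_m^T)\end{bmatrix}\right\} = V\begin{bmatrix}\mathcal{K}_{m,n} & \mathbf{I}_K\\ \mathbf{I}_K & \mathcal{K}_{n,m}\end{bmatrix} \in \mathbb{R}^{2K\times 2K}, \tag{6}$$

where the $(k_1, k_2)$ entry of $\mathcal{K}_{m,n} \in \mathbb{R}^{K\times K}$ is $V^{-1}\mathrm{tr}\left(\boldsymbol{\Gamma}_m^{[k_2,k_1]}\mathbf{R}_n^{[k_1,k_2]}\right)$ when $m \ne n$, $\mathbf{R}_n^{[k_1,k_2]} \triangleq E\{\mathbf{s}_n^{[k_1]}(\mathbf{s}_n^{[k_2]})^T\} \in \mathbb{R}^{V\times V}$,
and $\boldsymbol{\Gamma}_n^{[k_1,k_2]} \triangleq E\{\boldsymbol{\phi}_n^{[k_1]}(\boldsymbol{\phi}_n^{[k_2]})^T\} \in \mathbb{R}^{V\times V}$.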

The form of the FIM is a multivariate extension of the single dataset forms given in [2], [29]-[31]. The FIM has a form
that is a block matrix version of the single dataset result, e.g., see Fig. 5 and compare to the similar form given in [32] for
complex-valued ICA. The $2\times 2$ blocks with ones in the off-diagonal elements and pair-wise cross terms in the two diagonal
elements of the ICA FIM are here replaced with $2\times 2$ block matrices with identity matrices in the off-diagonal blocks and
the cross terms in the two diagonal block matrices, i.e., $\mathbf{F}_{m,n}$.



VII. IVA Identification Conditions 



The identification of sources in (real-valued) ICA is possible so long as no two sources are Gaussian with proportional
covariance matrices [2, Chapter 4]. When sources are said to be identifiable for ICA, this means that the sources can be
recovered up to a scale factor and arbitrary ordering, i.e., the true mixing matrix $\mathbf{A}_0$ can be identified up to $\mathbf{A}_0\boldsymbol{\Lambda}\mathbf{P}$, where $\boldsymbol{\Lambda}$
is any nonsingular diagonal matrix and $\mathbf{P}$ is any permutation matrix.



Since the model structure of IVA is a generalization of the model structure for ICA, we expect a generalization of the
identification conditions for ICA. Intuitively, the identification conditions for IVA are related to the dependence of the sources
across the datasets. More specifically, when sources possess dependence across datasets we expect that these estimated sources
can be 'aligned'; this is the original motivation of IVA [20], [22]. However, if there are sources for which no alignment
exhibits dependence, then under the ICA identification conditions sources can be separated but not necessarily aligned. That
is, without dependence across datasets the estimated sources of IVA would be no different than using ICA on each dataset
individually, since there is no dependency to exploit. The identification conditions, which we present in this section, capture
both cases, i.e., when there is or is not dependence between sources across datasets.

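To discuss identifiability of IVA we need to provide a notation that allows us to indicate a particular subset of rows in an
SCM. For this section, we let $\boldsymbol{\alpha} = [\alpha_1 \ldots \alpha_{K_\alpha}]^T \in \mathbb{N}^{K_\alpha}$, where $0 \le K_\alpha \le K$. The complementing subset of $\boldsymbol{\alpha}$ in $\{1, \ldots, K\}$
is indicated by $\boldsymbol{\alpha}^c \in \mathbb{N}^{K-K_\alpha}$. The IVA identification conditions use the following definition:

Definition 1 ($\alpha$-Gaussian). A source, $\mathbf{S} \in \mathbb{R}^{K\times V}$, has an $\alpha$-Gaussian component when $\mathrm{vec}_{\alpha}(\mathbf{S}) \perp\!\!\!\perp \mathrm{vec}_{\alpha^c}(\mathbf{S})$, and $\mathrm{vec}_{\alpha}(\mathbf{S}) \sim
\mathcal{N}(\mathbf{0}, \mathbf{R}_{\alpha})$, where $\mathbf{R}_{\alpha} \triangleq E\{\mathrm{vec}_{\alpha}(\mathbf{S})\,\mathrm{vec}_{\alpha}^T(\mathbf{S})\} \in \mathbb{R}^{K_\alpha V\times K_\alpha V}$ is nonsingular.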



The $\alpha$-Gaussian definition is used to identify that there exists a subset of rows in an SCM that is independent of the other
rows in the same SCM and that the given subset follows a multivariate Gaussian distribution. The theorem stating the IVA
identification conditions and its proof follow.

Theorem 1 (IVA Nonidentifiability). The sources cannot be identified if and only if (iff) $\exists\,\boldsymbol{\alpha} \ne \emptyset$ and $\exists\, m \ne n$ such that $\mathbf{S}_m$
and $\mathbf{S}_n$ have $\alpha$-Gaussian components for which $\mathbf{R}_{m,\alpha} = (\mathbf{I}_V\otimes\mathbf{D})\,\mathbf{R}_{n,\alpha}\,(\mathbf{I}_V\otimes\mathbf{D}) \in \mathbb{R}^{K_\alpha V\times K_\alpha V}$, where $\mathbf{D} \in \mathbb{R}^{K_\alpha\times K_\alpha}$ is
any full rank diagonal matrix.

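Proof of IVA Nonidentifiability: Given the FIM (40), (41), (42), since $\mathbf{F}_{m,n}$ is a covariance matrix, it must be positive
semidefinite and is singular iff $\exists\,(\mathbf{a}, \mathbf{b}) \ne (\mathbf{0}, \mathbf{0}) : \mathbf{a}^T\mathrm{diag}(\boldsymbol{\Phi}_m\mathbf{S}_n^T) - \mathbf{b}^T\mathrm{diag}(\boldsymbol{\Phi}_n\mathbf{S}_m^T) = 0,\ \forall\,\mathbf{S}_m \in \Omega_{\mathbf{S}_m}, \mathbf{S}_n \in \Omega_{\mathbf{S}_n}$, where
$\Omega_{\mathbf{X}}$ denotes the sample space of the random matrix $\mathbf{X}$.
It is convenient to rewrite the following:

$$\mathrm{diag}(\boldsymbol{\Phi}_m\mathbf{S}_n^T) = \mathrm{diag}\left(\sum_{v=1}^{V}\boldsymbol{\phi}_m(v)\,\mathbf{s}_n^T(v)\right) = \sum_{v=1}^{V}\boldsymbol{\phi}_m(v)\circ\mathbf{s}_n(v), \tag{7}$$

where $\mathbf{s}_n(v)$ and $\boldsymbol{\phi}_m(v)$ denote the $v$th columns of $\mathbf{S}_n$ and $\boldsymbol{\Phi}_m$, respectively.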

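Hence, the following statements are all equivalent conditional on $\exists\,(\mathbf{a}, \mathbf{b}) \ne (\mathbf{0}, \mathbf{0})$:

$$\begin{aligned}
& \mathbf{F}_{m,n}\ \text{is singular} && (8)\\
\Leftrightarrow\ 0 &= \mathbf{a}^T\mathrm{diag}(\boldsymbol{\Phi}_m\mathbf{S}_n^T) - \mathbf{b}^T\mathrm{diag}(\boldsymbol{\Phi}_n\mathbf{S}_m^T) && (9)\\
\Leftrightarrow\ 0 &= \mathbf{a}^T\sum_{v=1}^{V}\boldsymbol{\phi}_m(v)\circ\mathbf{s}_n(v) - \mathbf{b}^T\sum_{q=1}^{V}\boldsymbol{\phi}_n(q)\circ\mathbf{s}_m(q) && (10)\\
\Leftrightarrow\ 0 &= (\mathbf{1}_V\otimes\mathbf{a})^T\left(\mathrm{vec}(\boldsymbol{\Phi}_m)\circ\mathrm{vec}(\mathbf{S}_n)\right) - (\mathbf{1}_V\otimes\mathbf{b})^T\left(\mathrm{vec}(\boldsymbol{\Phi}_n)\circ\mathrm{vec}(\mathbf{S}_m)\right) && (11)\\
\Leftrightarrow\ 0 &= \mathrm{vec}_{\alpha}^T(\boldsymbol{\Phi}_m)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_n) - \mathrm{vec}_{\beta}^T(\boldsymbol{\Phi}_n)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\beta]})\,\mathrm{vec}_{\beta}(\mathbf{S}_m) && (12)\\
\Leftrightarrow\ 0 &= \mathrm{vec}_{\alpha}^T(\boldsymbol{\Phi}_m)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_n) - \mathrm{vec}_{\alpha}^T(\boldsymbol{\Phi}_n)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_m) && (13)\\
\Leftrightarrow\ 0 &= \mathrm{vec}_{\alpha}^T(\mathbf{S}_m)\mathbf{R}_{m,\alpha}^{-1}(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_n) - \mathrm{vec}_{\alpha}^T(\mathbf{S}_n)\mathbf{R}_{n,\alpha}^{-1}(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_m) && (14)\\
\Leftrightarrow\ \mathbf{0} &= \mathbf{R}_{m,\alpha}^{-1}(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]}) - (\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]})\mathbf{R}_{n,\alpha}^{-1} && (15)\\
\Leftrightarrow\ \mathbf{R}_{m,\alpha} &= (\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathbf{R}_{n,\alpha}\,(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]}^{-1}) && (16)\\
\Leftrightarrow\ \mathbf{R}_{m,\alpha} &= (\mathbf{I}_V\otimes\mathbf{D})\,\mathbf{R}_{n,\alpha}\,(\mathbf{I}_V\otimes\mathbf{D}), && (17)
\end{aligned}$$

where $\mathbf{D}_{\mathbf{a}[\alpha]} = \mathrm{Diag}(\mathbf{a}[\alpha])$, $\mathbf{D}_{\mathbf{b}[\alpha]} \triangleq \mathrm{Diag}(\mathbf{b}[\alpha])$, $\boldsymbol{\alpha} \in \mathbb{N}^{K_\alpha}$, and $\boldsymbol{\beta} \in \mathbb{N}^{K_\beta}$.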

It is straightforward to observe that (8), (9), (10), and (11) are equivalent expressions. From the relationship $(\mathbf{x}\otimes\mathbf{y})^T(\mathbf{w}\circ\mathbf{z}) =
\mathbf{w}^T(\mathrm{Diag}(\mathbf{x})\otimes\mathrm{Diag}(\mathbf{y}))\,\mathbf{z}$, the expression in (12) holds only when $\boldsymbol{\alpha} = \boldsymbol{\beta}$, i.e., the zero entries of $\mathbf{a}$ and $\mathbf{b}$ are at the same
locations. See Lemma 1 below to explain (14). Since (14) must hold for all possible values of $\mathrm{vec}_{\alpha}(\mathbf{S}_m)$ and $\mathrm{vec}_{\alpha}(\mathbf{S}_n)$, (15)
must hold. Equation (16) is equivalent since all entries of $\mathbf{b}[\alpha]$ are nonzero by (13). Lastly, since $\mathbf{R}_{m,\alpha}$ is symmetric we must
have that either $\mathbf{R}_{n,\alpha}$ is diagonal or $\mathbf{D}_{\mathbf{a}[\alpha]} = \mathbf{D}_{\mathbf{b}[\alpha]}^{-1}$. In either case (17) holds. □
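The Kronecker and Hadamard relationship used in this proof, together with the step from (10) to (11), can be verified numerically. The NumPy sketch below is an added illustration with arbitrary random inputs; the variable names are chosen here and carry no meaning beyond this example.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 4, 3
x, y = rng.standard_normal(V), rng.standard_normal(K)
w, z = rng.standard_normal(V * K), rng.standard_normal(V * K)

# (x kron y)^T (w o z) = w^T (Diag(x) kron Diag(y)) z
lhs = np.kron(x, y) @ (w * z)
rhs = w @ np.kron(np.diag(x), np.diag(y)) @ z

# Step from (10) to (11): a^T sum_v phi(v) o s(v) = (1_V kron a)^T (vec(Phi) o vec(S))
Phi, S = rng.standard_normal((K, V)), rng.standard_normal((K, V))
a = rng.standard_normal(K)
sum_form = a @ (Phi * S).sum(axis=1)
vec_form = np.kron(np.ones(V), a) @ (Phi.reshape(-1, order="F") * S.reshape(-1, order="F"))
```

Both pairs agree to numerical precision, since $\mathrm{Diag}(\mathbf{x}\otimes\mathbf{y}) = \mathrm{Diag}(\mathbf{x})\otimes\mathrm{Diag}(\mathbf{y})$ and vec stacks the columns of a $K\times V$ matrix sample by sample.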
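Lemma 1. For $m \ne n$,

$$\mathrm{vec}_{\alpha}^T(\boldsymbol{\Phi}_m)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_n) = \mathrm{vec}_{\alpha}^T(\boldsymbol{\Phi}_n)(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_m) \tag{18}$$

holds iff

$$\mathrm{vec}_{\alpha}^T(\mathbf{S}_m)\mathbf{R}_{m,\alpha}^{-1}(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{a}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_n) = \mathrm{vec}_{\alpha}^T(\mathbf{S}_n)\mathbf{R}_{n,\alpha}^{-1}(\mathbf{I}_V\otimes\mathbf{D}_{\mathbf{b}[\alpha]})\,\mathrm{vec}_{\alpha}(\mathbf{S}_m) \tag{19}$$

and $\mathbf{S}_m$ and $\mathbf{S}_n$ each have an $\alpha$-Gaussian component.

Proof: ($\Rightarrow$) Since the left-hand side of (18) is linear in $\mathrm{vec}_{\alpha}(\mathbf{S}_n)$, we must have that $\mathrm{vec}_{\alpha}(\boldsymbol{\Phi}_n)$ is not a function of
$\mathrm{vec}_{\alpha^c}(\mathbf{S}_n)$ and it is necessarily linear in $\mathrm{vec}_{\alpha}(\mathbf{S}_n)$, i.e., $\mathbf{S}_n$ has an $\alpha$-Gaussian component. By symmetry, the same can be
concluded about $\mathbf{S}_m$.

($\Leftarrow$) If $\mathbf{S}_n$ has an $\alpha$-Gaussian component then $\mathrm{vec}_{\alpha}(\boldsymbol{\Phi}_n) = \mathbf{R}_{n,\alpha}^{-1}\mathrm{vec}_{\alpha}(\mathbf{S}_n)$. □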
It is noteworthy that the IVA identification conditions admit sources for which the distribution can be factored,
i.e., $p_n(\mathbf{S}_n) = \prod_{q=1}^{Q}p_{n,q}\left(\mathrm{vec}_{\alpha_q}(\mathbf{S}_n)\right)$, where $\{\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_Q\}$, $\boldsymbol{\alpha}_q \subseteq \{1, \ldots, K\}$, $\boldsymbol{\alpha}_q\cap\boldsymbol{\alpha}_{q'} = \emptyset\ \forall\, q \ne q'$, and $\bigcup_{q=1}^{Q}\boldsymbol{\alpha}_q =
\{1, \ldots, K\}$. If, for example, $Q = K$, then IVA would produce the same identification conditions as ICA on each dataset
individually. Stated differently, identifiability of IVA does not require the sources to possess dependence across datasets.

Recall that a prime motivation for considering the IVA formulation is to determine when the sources can be aligned
in a common way across all datasets, i.e., under what conditions is $\hat{\mathbf{A}}^{[k]} \triangleq (\mathbf{W}^{[k]})^{-1} = \mathbf{A}^{[k]}\mathbf{P}\boldsymbol{\Lambda}^{[k]}$, where $\boldsymbol{\Lambda}^{[k]}$ is any full
rank diagonal matrix and $\mathbf{P}$ is a permutation matrix commonly shared by all datasets. The common permutation identification
condition is given in the next theorem, which uses the following definition:

Definition 2 ($\alpha$-independent). A source, $\mathbf{S} \in \mathbb{R}^{K\times V}$, is $\alpha$-independent when $\mathrm{vec}_{\alpha}(\mathbf{S}) \perp\!\!\!\perp \mathrm{vec}_{\alpha^c}(\mathbf{S})$.

The $\alpha$-independent definition is used to identify that there exists a subset of rows in an SCM (or SCV) that is independent
of the other rows in the SCM (SCV).

Theorem 2 (Common Permutation Matrix for IVA). Assume the IVA identification conditions of Theorem 1 are satisfied,
i.e., in the limit as $V \to \infty$, $(\mathbf{W}^{[k]})^{-1} = \mathbf{A}^{[k]}\mathbf{P}^{[k]}\boldsymbol{\Lambda}^{[k]}$.
The permutation matrix associated with each dataset is common iff $\forall\, m \ne n$, $\nexists\,\boldsymbol{\alpha} \ne \emptyset$ such that both $\mathbf{S}_m$ and $\mathbf{S}_n$ are
$\alpha$-independent.

Proof: The objective function given in (3) makes it clear that any permutation matrix at most affects the $\mathcal{I}_r\{\mathbf{y}_n\}$ term.
Furthermore, we only need consider permutation matrices that can achieve the global minimum. The proof is by contradiction
(in both directions):



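$$\exists\, 1 \le k_1 \ne k_2 \le K : \mathbf{P}^{[k_1]} \ne \mathbf{P}^{[k_2]} \tag{20}$$

$$\vdots \qquad (21),\ (22)$$

$$\mathbf{S}_m \text{ and } \mathbf{S}_n \text{ are } \alpha\text{-independent} \tag{23}$$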

We have used the fact that Ir{X]y} > with equality iff X AL y, which implies by the assumption of IVA that 
Ir IsJ"^'; s^°^' I = Vi 7^ OL2, where a.\ and 0.2 are any indexing sets. □ 

Thus, Theorem 2 provides an additional restriction on the sources (in a pairwise manner) which is required when the
estimated dependent sources across all datasets are to be 'aligned'.



A. Special Cases 



It is now insightful to consider important special cases of IVA with regard to the identification conditions. We begin by
considering the case when the $V$ samples are iid. This is equivalent to having $V = 1$, which implies that the identification
conditions can be derived as a special case of Theorem 1.

Theorem 3 (IVA Nonidentifiability with iid Samples). The sources cannot be identified iff $\exists\,\boldsymbol{\alpha} \ne \emptyset$ and $\exists\, m \ne n$ such that
$\mathbf{S}_m$ and $\mathbf{S}_n$ have $\alpha$-Gaussian components and $\mathbf{R}_{m,\alpha} = \mathbf{D}\mathbf{R}_{n,\alpha}\mathbf{D} \in \mathbb{R}^{K_\alpha\times K_\alpha}$, where $\mathbf{D} \in \mathbb{R}^{K_\alpha\times K_\alpha}$
is any full rank diagonal matrix.
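A small numerical illustration of this condition, under assumptions made here: for iid zero-mean Gaussian SCVs the score is $\boldsymbol{\phi} = \mathbf{R}^{-1}\mathbf{s}$, so the score covariance is $\mathbf{R}^{-1}$ and the diagonal blocks of the pairwise FIM become $\mathbf{R}_m^{-1}\circ\mathbf{R}_n$ and $\mathbf{R}_n^{-1}\circ\mathbf{R}_m$. The NumPy sketch below builds the normalized block with identity off-diagonal blocks (the common factor $V$ is dropped since it does not affect singularity) and inspects its smallest eigenvalue. The function names are hypothetical.

```python
import numpy as np

def fim_block_gaussian(R_m, R_n):
    """Normalized pairwise FIM block for iid Gaussian SCVs with covariances R_m, R_n."""
    K = R_m.shape[0]
    K_mn = np.linalg.inv(R_m) * R_n      # Hadamard product of R_m^{-1} and R_n
    K_nm = np.linalg.inv(R_n) * R_m
    I = np.eye(K)
    return np.block([[K_mn, I], [I, K_nm]])

def min_eig(R_m, R_n):
    """Smallest eigenvalue of the symmetric block; zero means nonidentifiable."""
    return np.linalg.eigvalsh(fim_block_gaussian(R_m, R_n)).min()

def corr2(rho):
    """2 x 2 correlation matrix for K = 2 datasets."""
    return np.array([[1.0, rho], [rho, 1.0]])

identifiable = min_eig(corr2(0.9), corr2(0.3))     # different cross-dataset correlation
same_corr = min_eig(corr2(0.5), corr2(0.5))        # R_m = D R_n D with D = I
sign_flip = min_eig(corr2(0.5), corr2(-0.5))       # R_m = D R_n D with D = Diag(1, -1)
ica_case = min_eig(np.array([[2.0]]), np.array([[3.0]]))   # K = 1, two Gaussian sources
```

Two Gaussian SCVs with different correlation structure across datasets give a nonsingular block, whereas the single-dataset case and the cases related by a diagonal $\mathbf{D}$ are singular, in agreement with the theorem.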



Another special case of interest is when $K = 1$, yielding the same formulation as ICA assuming sample-to-sample
dependence, i.e., not iid samples, the most general form for real-valued ICA.

Theorem 4 (ICA Nonidentifiability [2], [33]). The sources cannot be identified iff $\exists\, m \ne n$ such that $\mathbf{s}_m \in \mathbb{R}^{V}$ and $\mathbf{s}_n \in \mathbb{R}^{V}$
are Gaussian and $\mathbf{R}_m = \delta^2\mathbf{R}_n \in \mathbb{R}^{V\times V}$, where $\delta \ne 0$.

It can be verified that the identification conditions of Theorem 4 are consistent with the results found in [2, Chapter 4] and
[33].

Another special case of interest is when $K = 1$ and the samples are assumed iid.

Theorem 5 (ICA Nonidentifiability with iid Samples [34]). The sources cannot be identified iff $\exists\, m \ne n$ such that $s_m \in \mathbb{R}$
and $s_n \in \mathbb{R}$ are Gaussian.

The claim of Theorem 5, originally given in [34], states the well known result for ICA that at most one source can be
Gaussian for identification of all iid sources. Higher-order statistics under the iid assumption have been
the most widely exploited type of diversity in the derivation of ICA algorithms.

Additional diversity can extend the IVA and ICA identification conditions. An example is when data is complex-valued, a
case we do not consider in this paper.



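VIII. CRLB and iCRLB

The CRLB associated with the parameter vector $\boldsymbol{\theta}$ is the inverse of the FIM, i.e., $\mathrm{cov}\{\hat{\boldsymbol{\theta}}\} \succeq \mathbf{F}^{-1}$, where $\hat{\boldsymbol{\theta}}$ is an estimator
for $\boldsymbol{\theta}$. Due to the block diagonal structure of (40) we have that the inverse (if it exists, see identifiability discussion in Section
VII) of the portion of the FIM associated with the $m$th and $n$th source denoted by $\mathbf{F}_{m,n}$ in (6) is

$$\mathbf{F}_{m,n}^{-1} = \frac{1}{V}\begin{bmatrix}\mathcal{K}_{m,n} & \mathbf{I}_K\\ \mathbf{I}_K & \mathcal{K}_{n,m}\end{bmatrix}^{-1}.$$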



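It yields the following CRLB on the estimates of the demixing matrix quantities:

$$\mathrm{var}\left\{\hat{w}_{m,n}^{[k]}\right\} \ge \frac{1}{V}\mathbf{e}_k^T\left(\mathcal{K}_{m,n} - \mathcal{K}_{n,m}^{-1}\right)^{-1}\mathbf{e}_k, \quad 1 \le m \ne n \le N.$$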



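For this JBSS formulation, the definition of the interference to source ratio (ISR) is the same as in BSS [2], [31], namely:

$$\mathrm{ISR}_{m,n}^{[k]} \triangleq E\left\{\left(g_{m,n}^{[k]}\right)^2\right\}\frac{E\left\{\left(s_n^{[k]}\right)^2\right\}}{E\left\{\left(s_m^{[k]}\right)^2\right\}}, \quad 1 \le m \ne n \le N, \tag{24}$$

where $g_{m,n}^{[k]} = \mathbf{e}_m^T\mathbf{G}^{[k]}\mathbf{e}_n$ and $\mathbf{G}^{[k]} = \mathbf{W}^{[k]}\mathbf{A}^{[k]}$ is called the $k$th global demixing-mixing matrix.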
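The iCRLB for ISR is then:

$$\mathrm{ISR}_{m,n}^{[k]} \ge \frac{1}{V}\mathbf{e}_k^T\left(\mathcal{K}_{m,n} - \mathcal{K}_{n,m}^{-1}\right)^{-1}\mathbf{e}_k\,\frac{E\left\{\left(s_n^{[k]}\right)^2\right\}}{E\left\{\left(s_m^{[k]}\right)^2\right\}}. \tag{25}$$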



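Since the sources are (potentially) multivariate in the IVA formulation, it makes sense to define the ISR according to

$$\mathrm{ISR}_{m,n} \triangleq \sum_{k=1}^{K}\mathrm{ISR}_{m,n}^{[k]}, \quad 1 \le m \ne n \le N.$$

After some simple manipulation, the following compact form for the iCRLB results:

$$\mathrm{ISR}_{m,n} \ge \frac{1}{V}\mathrm{tr}\left(\left(\mathcal{K}_{m,n} - \mathcal{K}_{n,m}^{-1}\right)^{-1}\mathrm{Diag}\left(\mathrm{diag}(\mathbf{C}_n)\oslash\mathrm{diag}(\mathbf{C}_m)\right)\right),$$

where $\mathbf{C}_n = E\{\mathbf{S}_n\mathbf{S}_n^T\} \in \mathbb{R}^{K\times K}$. In what follows, for notational simplicity and without loss of generality, we assume the
sources have equal energy within each dataset, i.e., $\mathrm{diag}(\mathbf{C}_n) = \mathrm{diag}(\mathbf{C}_m)\ \forall\, 1 \le m, n \le N$.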
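When the samples are iid, the IVA iCRLB simplifies further if we note that:

$$\mathbf{R}_n^{[k_1,k_2]} = E\left\{\mathbf{s}_n^{[k_1]}\left(\mathbf{s}_n^{[k_2]}\right)^T\right\} = \sigma_n^{[k_1,k_2]}\mathbf{I}_V, \tag{26}$$

$$\boldsymbol{\Gamma}_n^{[k_1,k_2]} = E\left\{\boldsymbol{\phi}_n^{[k_1]}\left(\boldsymbol{\phi}_n^{[k_2]}\right)^T\right\} = \gamma_n^{[k_1,k_2]}\mathbf{I}_V, \tag{27}$$

and for $1 \le m \ne n \le N$,

$$\left[\mathcal{K}_{m,n}\right]_{k_1,k_2} = \gamma_m^{[k_1,k_2]}\sigma_n^{[k_1,k_2]}, \tag{28}$$

where $\sigma_n^{[k_1,k_2]} \triangleq E\{s_n^{[k_1]}(v)\,s_n^{[k_2]}(v)\} \in \mathbb{R}$ and $\gamma_n^{[k_1,k_2]} = E\{\phi_n^{[k_1]}(v)\,\phi_n^{[k_2]}(v)\} \in \mathbb{R}$ are not dependent on $v$ due to the iid
assumption.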

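For the iid IVA discussion we simplify by replacing the SCM notation with SCV notation, i.e., we define the SCV $\mathbf{s}_n$
as a random vector with $V$ realizations denoted by $\mathbf{s}_n(v) \in \mathbb{R}^{K}$. In addition, the multivariate score function is denoted by
$\boldsymbol{\phi}_n(\mathbf{s}_n) \in \mathbb{R}^{K}$. For now, let $\mathbf{R}_n \triangleq E\{\mathbf{s}_n\mathbf{s}_n^T\} \in \mathbb{R}^{K\times K}$ and $\boldsymbol{\Gamma}_m \triangleq E\{\boldsymbol{\phi}_m(\mathbf{s}_m)\boldsymbol{\phi}_m^T(\mathbf{s}_m)\} \in \mathbb{R}^{K\times K}$, from which we observe
that $\mathcal{K}_{m,n} = \boldsymbol{\Gamma}_m\circ\mathbf{R}_n = \mathrm{var}\{\boldsymbol{\phi}_m(\mathbf{s}_m)\circ\mathbf{s}_n\}$.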

The above gives the following iCRLB on the ISR when the samples are iid:

$$\mathrm{ISR}_{m,n} \ge \frac{1}{V}\mathrm{tr}\left(\left(\boldsymbol{\Gamma}_m\circ\mathbf{R}_n - \left(\boldsymbol{\Gamma}_n\circ\mathbf{R}_m\right)^{-1}\right)^{-1}\right).$$

The relationship between $\boldsymbol{\Gamma}$ and $\mathbf{R}$ given in the following lemma is the multivariate extension of the result given by [28,
Lemma 1b of Appendix B], which has also been given in [2, Chapter 4].


Lemma 2. $\boldsymbol{\Gamma} \succeq \mathbf{R}^{-1}$, with equality iff $\boldsymbol{\phi} = \mathbf{R}^{-1}\mathbf{s}$, i.e., $\mathbf{s}$ follows the Gaussian distribution.

Proof: The proof applies the extension of the Cauchy-Schwarz inequality for covariance matrices as given in [35].
Specifically, $\boldsymbol{\Gamma} - E\{\boldsymbol{\phi}\mathbf{s}^T\}\mathbf{R}^{-1}E\{\mathbf{s}\boldsymbol{\phi}^T\} \succeq \mathbf{0}$, with equality iff $\boldsymbol{\phi} = E\{\boldsymbol{\phi}\mathbf{s}^T\}\mathbf{R}^{-1}\mathbf{s}$. By noting that $E\{\mathbf{s}\boldsymbol{\phi}^T\} = \mathbf{I}$ we
arrive at the assertion. □

From this lemma, we see that a measure of non-Gaussianity (or higher-order statistics) is captured by the 'difference' between
$\boldsymbol{\Gamma}$ and $\mathbf{R}^{-1}$. Next, we show for elliptical distributions (a broad class of source distributions) how this non-Gaussianity measure
can be captured by a scalar quantity.
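Lemma 2 can be illustrated by simulation. The sketch below uses the multivariate Student-t distribution as an example of a non-Gaussian elliptical source; this choice, the parameter values, and the function names are assumptions made here. The score of a zero-mean multivariate t with dispersion $\boldsymbol{\Sigma}$ and $\nu$ degrees of freedom is $\boldsymbol{\phi}(\mathbf{x}) = (\nu + K)\boldsymbol{\Sigma}^{-1}\mathbf{x}/(\nu + \mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x})$, and its covariance is $\mathbf{R} = \frac{\nu}{\nu-2}\boldsymbol{\Sigma}$ for $\nu > 2$.

```python
import numpy as np

def student_t_samples(Sigma, nu, V, rng):
    """Draw V samples (columns) from a zero-mean multivariate t distribution."""
    K = Sigma.shape[0]
    z = np.linalg.cholesky(Sigma) @ rng.standard_normal((K, V))
    g = rng.chisquare(nu, size=V) / nu
    return z / np.sqrt(g)

def student_t_score(x, Sigma, nu):
    """Score phi(x) = -d log p(x) / dx for the multivariate t distribution."""
    K = Sigma.shape[0]
    Sx = np.linalg.solve(Sigma, x)
    u = np.sum(x * Sx, axis=0)               # x^T Sigma^{-1} x for each sample
    return (nu + K) * Sx / (nu + u)

rng = np.random.default_rng(0)
nu, V = 5.0, 200_000
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
K = Sigma.shape[0]
R = nu / (nu - 2.0) * Sigma                  # covariance of the t distribution
s = student_t_samples(Sigma, nu, V, rng)
phi = student_t_score(s, Sigma, nu)
Gamma_hat = phi @ phi.T / V                  # estimate of E{phi phi^T}
cross_hat = s @ phi.T / V                    # estimate of E{s phi^T}, close to I
gap_min_eig = np.linalg.eigvalsh(Gamma_hat - np.linalg.inv(R)).min()
kappa_hat = np.trace(Gamma_hat @ R) / K      # scalar with Gamma = kappa R^{-1}
kappa_theory = (nu + K) / (nu + K + 2.0) * nu / (nu - 2.0)
```

For these parameters the smallest eigenvalue of $\hat{\boldsymbol{\Gamma}} - \mathbf{R}^{-1}$ is positive and the estimated scalar exceeds one, as the lemma predicts for a non-Gaussian source.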



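The pdf (assuming it exists) for a zero-mean random vector following the elliptical distribution is

$$p(\mathbf{x}) = \frac{c_K}{\sqrt{\det\boldsymbol{\Sigma}}}\,h_K\left(\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}\right), \tag{29}$$

where $\boldsymbol{\Sigma} \in \mathbb{R}^{K\times K}$ is the positive definite matrix frequently termed the dispersion matrix, $h_K$ is some nonnegative function,
and $c_K$ denotes the constant that makes (29) integrate to one. If the covariance matrix, $E\{\mathbf{x}\mathbf{x}^T\} = \mathbf{R}$, exists, then for any
elliptical distribution it is a scalar multiple of the dispersion matrix, i.e., $\mathbf{R} = \rho\boldsymbol{\Sigma}$, where $\rho > 0$. Then the score function is
$\boldsymbol{\phi}(\mathbf{x}) = -\partial\log p(\mathbf{x})/\partial\mathbf{x} = g\left(\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}\right)\boldsymbol{\Sigma}^{-1}\mathbf{x}$, where $g(u) \triangleq -\frac{2}{h_K(u)}\frac{dh_K(u)}{du}$.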

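For elliptical distributions (see Appendix B), $\boldsymbol{\Gamma} = \kappa\mathbf{R}^{-1}$, where $\kappa \triangleq \frac{\rho}{K}E\{g^2(r^2)\,r^2\}$ with $r^2 = \mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}$. By application of
Lemma 2 this implies that $\kappa \ge 1$ with equality iff Gaussian. Therefore, the iCRLB for ISR with elliptical sources is

$$\mathrm{ISR}_{m,n} \ge \frac{1}{V}\mathrm{tr}\left(\left(\kappa_m\mathbf{R}_m^{-1}\circ\mathbf{R}_n - \left(\kappa_n\mathbf{R}_n^{-1}\circ\mathbf{R}_m\right)^{-1}\right)^{-1}\right).$$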

For this performance bound we provide the following theorem. 



Theorem 6. If two SCVs follow distributions from the elliptical family with covariance matrices $\mathbf{R}_m$ and $\mathbf{R}_n$, then $\mathrm{ISR}_{m,n}$
is less than or equal to the $\mathrm{ISR}_{m,n}$ associated with Gaussian SCVs having the same covariance matrices.

Proof: See |8| for proof that R^^ o R„ — (R^^ o R„i) ^ ^ 0. For elliptically distributed sources, via Lemma|2j we have 
that K„i > 1 and k„ > 1, thus 



(R,;i o R„ 



(yl^m^rn ° R" ^ (R-ti ^ 



R„ 



and since A ^ B, it implies x^Ax < x^Bx,Vx, and thus tr (A) < tr (B). 

A special case, which arrives at a form directly analogous to the |ICA| form, occurs when R„ 

K K„ 



R„ 



ISR„ 



> 



1 



□ 



(30) 



This expression clearly shows how, for second-order uncorrelated elliptical sources, the 'degree' of non-Gaussianity, as expressed by $\kappa$, directly determines the source separation performance. In fact, as shown in the following theorem, the same statement holds for second-order correlated elliptical sources.
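The dependence of the elliptical bound on $\kappa$ can be explored numerically. The following is a minimal sketch, not code from the paper: the function name `elliptical_bound_trace` is hypothetical, and it evaluates only the trace term $\mathrm{tr}\{(\kappa_m \mathbf{R}_m^{-1} \circ \mathbf{R}_n - \kappa_n^{-1}(\mathbf{R}_n^{-1} \circ \mathbf{R}_m)^{-1})^{-1}\}$, leaving any normalization by the number of samples or datasets to the caller.

```python
import numpy as np


def elliptical_bound_trace(kappa_m, kappa_n, R_m, R_n):
    """Trace term of the elliptical-source ISR bound (unnormalized).

    kappa_m, kappa_n: non-Gaussianity scalars (>= 1) of the two SCVs.
    R_m, R_n: K x K positive definite covariance matrices.
    The '*' below is the elementwise (Hadamard) product.
    """
    A = kappa_m * (np.linalg.inv(R_m) * R_n)   # kappa_m R_m^{-1} o R_n
    B = kappa_n * (np.linalg.inv(R_n) * R_m)   # kappa_n R_n^{-1} o R_m
    M = A - np.linalg.inv(B)
    return float(np.trace(np.linalg.inv(M)))


if __name__ == "__main__":
    K = 3
    I = np.eye(K)
    # With identity covariances the trace collapses to K*kappa_n/(kappa_m*kappa_n - 1).
    print(elliptical_bound_trace(2.0, 3.0, I, I))
```

With identity covariances the trace term reduces to $K\kappa_n/(\kappa_m\kappa_n - 1)$, and for fixed covariances it decreases as $\kappa_m$ grows, which is the behavior the theorems in this section describe.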

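Theorem 7. If three SCVs follow distributions from the elliptical family with covariance matrices $\mathbf{R}_m = \mathbf{R}_{m'}$ and $\mathbf{R}_n$, and $\kappa_m \geq \kappa_{m'}$, then $\mathrm{ISR}_{m,n} \leq \mathrm{ISR}_{m',n}$.

Proof: For elliptically distributed sources, via Lemma 2 we have that $\kappa_m \geq \kappa_{m'} \geq 1$ and $\kappa_n \geq 1$, thus

$$\left( \kappa_m \mathbf{R}_m^{-1} \circ \mathbf{R}_n - \kappa_n^{-1} \left( \mathbf{R}_n^{-1} \circ \mathbf{R}_m \right)^{-1} \right)^{-1} \preceq \left( \kappa_{m'} \mathbf{R}_{m'}^{-1} \circ \mathbf{R}_n - \kappa_n^{-1} \left( \mathbf{R}_n^{-1} \circ \mathbf{R}_{m'} \right)^{-1} \right)^{-1},$$

and since $\mathbf{A} \preceq \mathbf{B}$ implies $\mathbf{x}^T\mathbf{A}\mathbf{x} \leq \mathbf{x}^T\mathbf{B}\mathbf{x},\ \forall \mathbf{x}$, it follows that $\mathrm{tr}(\mathbf{A}) \leq \mathrm{tr}(\mathbf{B})$. □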



A. CRLB for ICA

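Another special case, which is of particular interest, is when there is only one dataset, i.e., $K = 1$. For this case, the expressions above further simplify to the more extensively studied ICA performance bounds [29]-[31]. If $K = 1$, we can replace the SCV notation with source component notation, i.e., let $\mathbf{s}_n \in \mathbb{R}^V$ be the random vector and the multivariate score function be denoted by $\boldsymbol{\phi}_m \in \mathbb{R}^V$. Then, for this section we have $\mathbf{R}_n \triangleq E\{\mathbf{s}_n\mathbf{s}_n^T\} \in \mathbb{R}^{V \times V}$ and $\boldsymbol{\Gamma}_m \triangleq E\{\boldsymbol{\phi}_m\boldsymbol{\phi}_m^T\} \in \mathbb{R}^{V \times V}$, from which we observe that for $m \neq n$, $\mathcal{K}_{m,n} = V^{-1}\mathrm{tr}(\boldsymbol{\Gamma}_m\mathbf{R}_n) = V^{-1}\mathrm{var}\{\boldsymbol{\phi}_m^T\mathbf{s}_n\}$. Also,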



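$$\mathrm{ISR}_{m,n} \geq \frac{1}{V}\, \frac{\mathcal{K}_{n,m}}{\mathcal{K}_{m,n}\mathcal{K}_{n,m} - 1}, \quad \text{with } \mathcal{K}_{n,m} = \frac{1}{V}\mathrm{var}\{\boldsymbol{\phi}_n^T\mathbf{s}_m\}. \qquad (31)$$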



Under the Gaussian SCV data-model assumption, $E\{\boldsymbol{\phi}\boldsymbol{\phi}^T\} = E\{\mathbf{R}^{-1}\mathbf{s}\mathbf{s}^T\mathbf{R}^{-1}\} = \mathbf{R}^{-1}$.
Two particular subcases in ICA are of interest. The first case is when the samples are iid with unit variance, for which $\mathbf{R}_n = \mathbf{I}_V$, $\boldsymbol{\Gamma}_m = E\{\phi_m^2\}\mathbf{I}_V$, and $\mathcal{K}_{m,n} = \kappa_m$, where $\kappa_m = E\{\phi_m^2\} \geq 1$. These simplifications give the same results as in [25, Eq. 38] and [30, Thm. 2], namely:

$$\mathrm{ISR}_{m,n} \geq \frac{1}{V}\, \frac{\kappa_n}{\kappa_m\kappa_n - 1}. \qquad (32)$$

The second subcase of ICA is for sources with Gaussian sample-to-sample dependence, i.e., $\mathbf{s}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_n \in \mathbb{R}^{V \times V})$. Then we have that $\boldsymbol{\Gamma}_m = \mathbf{R}_m^{-1}$ and $\mathcal{K}_{m,n} = V^{-1}\mathrm{tr}(\mathbf{R}_m^{-1}\mathbf{R}_n)$, which corresponds to [31, Eq. 19].

IX. Examples of Algorithm Performance and CRLB

In this section, we compare the performance of several IVA algorithms versus the iCRLB given in Section VIII.



A. MPE IVA

For our first set of experiments, we consider sources following the multivariate power exponential (MPE) distribution, an elliptical distribution with $h_K(u) = \exp(-\tfrac{1}{2}u^{\beta})$ and normalization constant $c_K = \beta\,\Gamma(K/2) / \left(\pi^{K/2}\, 2^{K/(2\beta)}\, \Gamma(K/(2\beta))\right)$, where $\beta > 0$ is termed the shape parameter. This distribution possesses a score function which includes the score functions used in both [36] and [10] as special cases. In this section, we consider IVA with multivariate power exponential distribution model (IVA-MPE), where the algorithm was presented in [37], using simulated datasets with iid samples from the MPE family. The performance of IVA-MPE is compared with the iCRLB derived in Section VIII.



For this experiment, there are $N = 3$ MPE SCVs of dimension $K$. All the sources use the same shape parameter, $\beta$. The covariance matrix associated with each source is randomly picked for the experiment, yet fixed for all trials in the experiment. The $k$th entry of each SCV is used as a latent source for the $k$th dataset. Entries of the random mixing matrices, $\mathbf{A}^{[k]}$, are from the standard normal distribution and are randomly selected for each trial.



We compute the theoretical iCRLB for ISR and compare this value with the ISR achieved using IVA-MPE with the correct shape parameter for each source. We then compute the total theoretical normalized ISR defined as

$$\mathrm{ISR} \triangleq \sum_{m=1,\,n=1,\,m\neq n}^{N} \mathrm{ISR}_{m,n}.$$

We compare this theoretical ISR with the average ISR computed from 1000 independent trials of the algorithm as we vary the number of samples per dataset, $V$.



Due to the presence of local minima in the IVA objective function for non-Gaussian sources [38], the algorithm may converge to local minima. At local minima, the sources are separated within a dataset but the SCVs are not successfully identified, i.e., the permutation ambiguity is unresolved. We first compare the iCRLB for the ISR with the mean of the ISR achieved over successful trials. A trial is deemed successful if the location of the maximum absolute entry in each row of $\mathbf{G}^{[k]} \triangleq \mathbf{W}^{[k]}\mathbf{A}^{[k]}$ is unique within each dataset and colocated across the datasets (the former indicates sources are separated within each dataset and the latter indicates if the permutation ambiguity is resolved). The fraction of trials which are successful increases as $\beta$ decreases and/or as the sample size per dataset increases. The lowest success rate was 98%, when $V = 100$ and $\beta = 6$. For all other settings the success rate was greater than 99.5%. From Fig. 1, the performance of the IVA algorithm approaches the iCRLB as the sample size per dataset increases.
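The success criterion above can be sketched in a few lines. This is an illustrative implementation only; the function name `trial_is_successful` and the list-of-matrices input are assumptions, not part of the original experiments.

```python
import numpy as np


def trial_is_successful(G_list):
    """Check the success criterion for one trial.

    G_list: list of K square global matrices, one per dataset
            (each assumed to be the product of the estimated demixing
            matrix and the true mixing matrix for that dataset).
    Returns True when, in every dataset, the largest-magnitude entry of
    each row falls in a distinct column (sources separated) and those
    column positions agree across datasets (permutation resolved).
    """
    reference = None
    for G in G_list:
        peaks = np.argmax(np.abs(G), axis=1)      # dominant column per row
        if len(set(peaks.tolist())) != G.shape[0]:
            return False                          # two rows share a source
        if reference is None:
            reference = peaks
        elif not np.array_equal(peaks, reference):
            return False                          # misaligned across datasets
    return True
```

A trial whose global matrices are all close to the same permutation matrix passes, while one that separates the sources in each dataset but orders them differently across datasets fails.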



We also show in Fig. 2, for the same experiment described above, the performance of IVA-MPE when the algorithm selects between one of two shape parameters ($\beta \in \{0.5, 2.0\}$) according to which shape parameter provides the lowest cost.



In another experiment, we use the same parameters as before except now the SCVs each have identity covariance matrices. For this experiment, there are nonidentifiable conditions as $\beta \to 1$, thus we compare the iCRLB for the ISR with the median rather than the mean. From Fig. 3, the performance of the IVA algorithm approaches the iCRLB as the sample size per dataset increases.

In both Fig. 1 and Fig. 3, the iCRLB follows the behavior predicted by Theorems 3, 6, and 7. Namely, the iCRLB is infinite when sources are Gaussian and $\mathbf{R}_n = \mathbf{I}$ for all sources; the maximum ISR occurs when sources are Gaussian ($\beta = 1$); and as $\beta$ moves 'away' from one the non-Gaussianity measure $\kappa$ increases, which yields better source separation, i.e., lower ISR.
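The growth of $\kappa$ as $\beta$ moves away from one can be checked with a closed form. The sketch below is our own derivation rather than an expression quoted from the paper: it assumes $h_K(u) = \exp(-\tfrac{1}{2}u^{\beta})$, so that $g_K(u) = \beta u^{\beta-1}$ and $u^{\beta}$ is gamma distributed with shape $K/(2\beta)$ and scale 2, and it uses $\kappa = \rho\,E\{g_K^2(u)\,u\}/K$ with $\rho = E\{u\}/K$. The function name `mpe_kappa` is hypothetical.

```python
from math import gamma


def mpe_kappa(beta: float, K: int) -> float:
    """Non-Gaussianity scalar kappa for a K-dimensional MPE source.

    Derived under the assumptions stated in the accompanying text;
    requires K/(2*beta) + 2 - 1/beta > 0 so the gamma functions are defined.
    """
    a = K / (2.0 * beta)                      # gamma shape of u**beta
    num = 4.0 * beta**2 * gamma(a + 1.0 / beta) * gamma(a + 2.0 - 1.0 / beta)
    return num / (K**2 * gamma(a) ** 2)


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 4.0):
        print(beta, mpe_kappa(beta, K=3))
```

Under these assumptions the value is one for the Gaussian case ($\beta = 1$) and exceeds one on either side of it, consistent with the behavior described above.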



B. Orthogonal Generalized Joint Diagonalization with Second-Order Lags 

In this section, we consider the effect of sample dependency. To the best of our knowledge, there is only one algorithm in the IVA framework that accounts for sample-to-sample dependence, namely joint diagonalization via second-order statistics (JDIAG-SOS) as given in [12]. The performance of JDIAG-SOS is compared with the iCRLB derived in Section VIII. All the sources are a vector moving average of iid Gaussian samples, i.e.,

$$\mathbf{s}_n(v) = \sum_{l=0}^{L-1} \mathbf{B}_l\, \mathbf{z}(v - l), \qquad (33)$$
Fig. 1. The average ISR (of the successful trials) of IVA-MPE algorithm for various numbers of iid samples versus the shape parameter of the simulated SCV in the iid IVA experiment. The algorithm uses exact knowledge of the shape parameter. All results are compared with the iCRLB.

Fig. 2. The average ISR (of the successful trials) of IVA-MPE algorithm for various numbers of iid samples versus the shape parameter of the simulated SCV in the iid IVA experiment. The algorithm selects from one of two shape parameters, $\beta \in \{0.5, 2.0\}$, and thus does not use exact knowledge of the shape parameter. All results are compared with the iCRLB.



where $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_K)$ and $[\mathbf{B}_l]_{k_1,k_2} \sim \mathcal{N}(0, 1)$. For this experiment, there are $N = 3$ sources for $K$ datasets, each with $V = 1000$ samples and $L = 4$. Entries of the random mixing matrices, $\mathbf{A}^{[k]}$, are from the standard normal distribution and are randomly selected for each trial.

We compute the theoretical iCRLB for ISR assuming the data was generated with $L = 1, \ldots, 4$. Since $L = 4$ for the data, the performance bound is shown to decrease until the lag is 3. The performance bound for $L = 4$ is shown for lags greater than 3. We compare the performance bounds with the average over 100 independent trials of the ISR achieved using JDIAG-SOS with various lags. Because JDIAG-SOS estimates orthogonal demixing matrices, there exists a noticeable difference between the iCRLB for ISR and the observed ISR.
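The vector moving average source model of this subsection can be simulated as follows. This is a minimal sketch under stated assumptions: the function name `vma_scv` is hypothetical, one $K$-dimensional SCV is generated per call, and extra start-up samples of the driving noise are drawn so that every output sample has a full window of $L$ lags.

```python
import numpy as np


def vma_scv(rng, K, V, L):
    """Generate one SCV as a vector moving average of iid Gaussian noise.

    Returns (s, B, z) where
      s: K x V array, s[:, v] = sum_l B[l] @ z[:, (L - 1) + v - l]
      B: L x K x K array of standard normal filter matrices
      z: K x (V + L - 1) array of iid standard normal driving samples
    """
    B = rng.standard_normal((L, K, K))
    z = rng.standard_normal((K, V + L - 1))
    s = np.zeros((K, V))
    for lag in range(L):
        start = L - 1 - lag                 # column of z aligned with v = 0
        s += B[lag] @ z[:, start:start + V]
    return s, B, z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s, B, z = vma_scv(rng, K=3, V=1000, L=4)
    print(s.shape)
```

Each of the $N$ sources would be drawn with its own call, and the $k$th row of each result then serves as the latent source for the $k$th dataset.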



Fig. 3. The iCRLB theory for ISR as the shape parameter, $\beta$, varies is compared with the median ISR of all 1000 trials for different numbers of iid samples, $V$.

Fig. 4. The average ISR for 100 trials by JDIAG-SOS(L). The number of lags used by JDIAG-SOS is varied from 0 to 9 ($L = 1, \ldots, 10$). The iCRLB is shown assuming at most lag = 3.

X. Conclusion

The use of IVA for the separation of multiple datasets concurrently has been a more recent development within the general BSS literature. A variety of algorithms have been developed that are essentially the multivariate extensions of ICA algorithms which take into account the dependence of sources between datasets in a variety of ways. There are three principal reasons for using these algorithms (versus just using ICA individually on each dataset). First, to increase the set of sources which can be identified. Second, to automatically 'align' dependent sources. Third, to maximize the achievable source separation. In this work, we have given the larger set of sources which can be identified by IVA, proven when the estimated sources can be 'aligned', and provided the bound on achievable source separation using IVA. These results are achieved for an IVA that accounts for linear and nonlinear dependence of sources across datasets, non-Gaussianity, and sample-to-sample dependence. It is clear that IVA bridges the gap between CCA and ICA.

It will be interesting for future work to consider the additional diversity of complex-valued sources which are improper or noncircular. Additionally, our work will be useful for assessing the performance of future algorithms which account for sample dependency in an IVA framework.



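Appendix A
Derivation of IVA FIM

Here we derive the FIM of (1) with respect to $\mathcal{W}$. The $KN^2$ parameters result in a $KN^2 \times KN^2$ dimension FIM, with the entry associated with $w^{[k_1]}_{m_1,n_1}$ and $w^{[k_2]}_{m_2,n_2}$ given by the expected product of the corresponding log-likelihood derivatives.

For the computations to follow it is useful to observe that

$$\frac{\partial \log\left|\det \mathbf{W}^{[k]}\right|}{\partial w^{[k]}_{m,n}} = \left[\left(\mathbf{W}^{[k]}\right)^{-T}\right]_{m,n}, \qquad \mathbf{Y}_m = \left[\mathbf{y}_m^{[1]}, \ldots, \mathbf{y}_m^{[K]}\right]^T \in \mathbb{R}^{K \times V}, \qquad (34)$$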
Fig. 5. Form of FIM when $N = 3$ sources. All entries are $K \times K$ matrices and we use ■ to denote zero blocks. The entries of FIM associated with $\mathbf{F}_{1,2}$, $\mathbf{F}_{1,3}$, and $\mathbf{F}_{2,3}$ are indicated by blue, green, and red, respectively.



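$$\frac{\partial \mathbf{Y}_l}{\partial w^{[k]}_{m,n}} = \delta_{l,m}\, \mathrm{Diag}(\mathbf{e}_k)\, \mathbf{X}_n \in \mathbb{R}^{K \times V}, \qquad (35)$$

and

$$\frac{\partial \log\left(p_m(\mathbf{Y}_m)\right)}{\partial w^{[k]}_{m,n}} = \mathrm{tr}\!\left( \left( \frac{\partial \log\left(p_m(\mathbf{Y}_m)\right)}{\partial \mathbf{Y}_m} \right)^{T} \frac{\partial \mathbf{Y}_m}{\partial w^{[k]}_{m,n}} \right) \qquad (36)$$

$$= -\mathrm{tr}\!\left( \boldsymbol{\Phi}_m^T\, \mathrm{Diag}(\mathbf{e}_k)\, \mathbf{X}_n \right) \qquad (37)$$

$$= -\left(\boldsymbol{\phi}_m^{[k]}\right)^T \mathbf{x}_n^{[k]}, \qquad (38)$$

where $\mathbf{X}_n \in \mathbb{R}^{K \times V}$ has $(\mathbf{x}_n^{[k]})^T$ as its $k$th row and $\boldsymbol{\Phi}_m \in \mathbb{R}^{K \times V}$ has $(\boldsymbol{\phi}_m^{[k]})^T$ as its $k$th row. Note that (36) is due to applying the chain rule given in [39, Sect. 2.8.1]. Thus the gradient of the likelihood function in (1) is

$$\frac{\partial \mathcal{L}(\mathcal{W})}{\partial w^{[k]}_{m,n}} = V\, \tilde{w}^{[k]}_{m,n} - \left(\boldsymbol{\phi}_m^{[k]}\right)^T \mathbf{x}_n^{[k]},$$

where $\tilde{w}^{[k]}_{m,n}$ is the entry in the $m$th row and $n$th column of $(\mathbf{W}^{[k]})^{-T}$.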
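Letting $\mathbf{A} = \mathbf{W} = \mathbf{I}$ we have the FIM of interest with entries given by

$$[\mathbf{F}]_{(k_1,m_1,n_1),(k_2,m_2,n_2)} = E\left\{ \frac{\partial \mathcal{L}}{\partial w^{[k_1]}_{m_1,n_1}} \frac{\partial \mathcal{L}}{\partial w^{[k_2]}_{m_2,n_2}} \right\}\Bigg|_{\mathbf{A}=\mathbf{I},\,\mathbf{W}=\mathbf{I}} = E\left\{ \left(\boldsymbol{\phi}_{m_1}^{[k_1]}\right)^T \mathbf{s}_{n_1}^{[k_1]} \left(\mathbf{s}_{n_2}^{[k_2]}\right)^T \boldsymbol{\phi}_{m_2}^{[k_2]} \right\} - V^2 \delta_{m_1,n_1}\delta_{m_2,n_2}, \qquad (39)$$

where the following expression holds, $E\{\boldsymbol{\phi}_m^{[k_1]} (\mathbf{s}_n^{[k_2]})^T\} = \delta_{k_1,k_2}\delta_{m,n}\mathbf{I}_V$; see [2]. Since, by assumption, both $E\{\boldsymbol{\phi}_m^{[k]}\} = \mathbf{0}$ and $E\{\mathbf{s}_n^{[k]}\} = \mathbf{0}$, it is true that $[\mathbf{F}]_{(k_1,m_1,n_1),(k_2,m_2,n_2)} = 0$ when one of the entries in $\{m_1, n_1, m_2, n_2\}$ is unique. It is also zero when $m_1 = n_1 \neq m_2 = n_2$, i.e., $E\{(\boldsymbol{\phi}_{m_1}^{[k_1]})^T \mathbf{s}_{m_1}^{[k_1]}\}\, E\{(\mathbf{s}_{m_2}^{[k_2]})^T \boldsymbol{\phi}_{m_2}^{[k_2]}\} - V^2 = V^2 - V^2 = 0$. Thus, there are only three nonzero cases to consider:

$$[\mathbf{F}]_{(k_1,m_1,n_1),(k_2,m_2,n_2)} = \begin{cases} V\left(\mathcal{K}^{[k_1,k_2]}_{m_1,m_1} - V\right), & m_1 = n_1 = m_2 = n_2 \\ V\,\mathcal{K}^{[k_1,k_2]}_{m_1,n_1}, & m_1 = m_2 \neq n_1 = n_2 \\ V\,\delta_{k_1,k_2}, & m_1 = n_2 \neq m_2 = n_1 \\ 0, & \text{otherwise}, \end{cases}$$

where $\mathcal{K}^{[k_1,k_2]}_{m,n} = V^{-1} E\{ (\boldsymbol{\phi}_m^{[k_1]})^T \mathbf{s}_n^{[k_1]} (\mathbf{s}_n^{[k_2]})^T \boldsymbol{\phi}_m^{[k_2]} \}$ is the $(k_1, k_2)$ entry of $\mathcal{K}_{m,n}$. The form of this matrix (e.g., see Fig. 5) is the block-matrix extension of that for the single dataset FIM given in [32].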
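There exists a permuted FIM in which there are $N + N(N-1)/2$ nonzero matrices along the diagonal, i.e.,

$$\mathbf{F} = \left( \bigoplus_{n=1}^{N} \mathbf{F}_n \right) \oplus \left( \bigoplus_{m=1,\,n=m+1}^{N} \mathbf{F}_{m,n} \right). \qquad (40)$$

The submatrices are given by

$$\mathbf{F}_n \triangleq \mathrm{var}\left\{ \mathrm{diag}\left( \boldsymbol{\Phi}_n\mathbf{S}_n^T - V\mathbf{I}_K \right) \right\} = V\left( \mathcal{K}_{n,n} - V\mathbf{1}_{K \times K} \right) \qquad (41)$$

and

$$\mathbf{F}_{m,n} = \mathrm{cov}\left\{ \begin{bmatrix} \mathrm{diag}\left(\boldsymbol{\Phi}_m\mathbf{S}_n^T\right) \\ \mathrm{diag}\left(\boldsymbol{\Phi}_n\mathbf{S}_m^T\right) \end{bmatrix} \right\} = V \begin{bmatrix} \mathcal{K}_{m,n} & \mathbf{I}_K \\ \mathbf{I}_K & \mathcal{K}_{n,m} \end{bmatrix}, \qquad (42)$$

where $\mathbf{F}_n \in \mathbb{R}^{K \times K}$ and $\mathbf{F}_{m,n} \in \mathbb{R}^{2K \times 2K}$. It can be shown that for $1 \leq m \neq n \leq N$ we have $\mathcal{K}^{[k_1,k_2]}_{m,n} = \frac{1}{V}\mathrm{tr}\left( \boldsymbol{\Gamma}_m^{[k_1,k_2]} \mathbf{R}_n^{[k_2,k_1]} \right)$, where $\mathbf{R}_n^{[k_1,k_2]} \triangleq E\{ \mathbf{s}_n^{[k_1]} (\mathbf{s}_n^{[k_2]})^T \} \in \mathbb{R}^{V \times V}$ and $\boldsymbol{\Gamma}_m^{[k_1,k_2]} \triangleq E\{ \boldsymbol{\phi}_m^{[k_1]} (\boldsymbol{\phi}_m^{[k_2]})^T \} \in \mathbb{R}^{V \times V}$.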



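Appendix B
Score Function Covariance Matrix for Elliptical Distributions

In this appendix, we show that the score function covariance matrix, $\boldsymbol{\Gamma} = E\{\boldsymbol{\phi}\boldsymbol{\phi}^T\}$, is a scalar multiple of the inverse of the covariance matrix for all elliptical distributions defined by (29). We begin by letting $\mathbf{z} = \boldsymbol{\Sigma}^{-1/2}\mathbf{x}$ so that $p(\mathbf{z}) = p(\boldsymbol{\Sigma}^{1/2}\mathbf{z}) \det(\boldsymbol{\Sigma}^{1/2}) = c_K h_K(\mathbf{z}^T\mathbf{z})$, which results in $\boldsymbol{\Gamma} = \boldsymbol{\Sigma}^{-1/2} E\{ g_K^2(\mathbf{z}^T\mathbf{z})\, \mathbf{z}\mathbf{z}^T \} \boldsymbol{\Sigma}^{-1/2}$. To compute the expectation requires the following multivariate integral to be evaluated:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g_K^2(\mathbf{z}^T\mathbf{z})\, z_l z_k\, p(\mathbf{z})\, d\mathbf{z}. \qquad (43)$$

We use a transformation of variables utilized for similar problems in [40], [41], namely,

$$z_1 = r \prod_{k=1}^{K-1} \sin\theta_k, \qquad (44)$$

$$z_j = r \left( \prod_{k=1}^{K-j} \sin\theta_k \right) \cos\theta_{K-j+1}, \quad 2 \leq j \leq K-1, \qquad (45)$$

$$z_K = r \cos\theta_1, \qquad (46)$$

where $0 \leq \theta_k \leq \pi$ if $k \leq K-2$, $0 \leq \theta_{K-1} < 2\pi$, and $0 \leq r < \infty$. By noting that $\mathbf{z}^T\mathbf{z} = r^2$ and that the Jacobian of the transformation from $\mathbf{z}$ to $[\theta_1 \ldots \theta_{K-1}\ r]^T$ is $r^{K-1}\sin^{K-2}\theta_1 \sin^{K-3}\theta_2 \cdots \sin\theta_{K-2} = r^{K-1}\prod_{k=1}^{K-2}(\sin\theta_k)^{K-1-k}$, the integral separates into a radial part and an angular part.

There are two cases, $l = k$ and $l \neq k$, required to evaluate (43). Let us consider the former first,

$$E\{ g_K^2(\mathbf{z}^T\mathbf{z})\, z_l^2 \} = E\{ g_K^2(r^2)\, r^{K+1} \}\, \frac{2\pi^{K/2}}{K\,\Gamma(K/2)}, \qquad (47)$$

where $E\{f(r)\}$ is shorthand for $\int_0^{\infty} f(r)\, c_K h_K(r^2)\, dr$ and we have made use of $\int_0^{\pi} \sin^n\theta\, d\theta = \sqrt{\pi}\,\Gamma[(n+1)/2] / \Gamma[(n+2)/2]$ when $n \geq 1$.

Now for the off-diagonal terms, e.g., when $K = 2$, $E\{ g_K^2(\mathbf{z}^T\mathbf{z})\, z_1 z_2 \} = \int_0^{\infty}\!\int_0^{2\pi} g_K^2(r^2)\, p(r)\, r^3 \cos\theta \sin\theta\, d\theta\, dr = 0$, where we have used $\int_0^{\pi} \cos(\theta)\sin^n(\theta)\, d\theta = \int_0^{2\pi} \cos(\theta)\sin^n(\theta)\, d\theta = 0$ when $n \in \mathbb{N}^*$. The result holds for the more general case when $K > 2$ and $l \neq k$, and we arrive at the final expression $E\{\boldsymbol{\phi}\boldsymbol{\phi}^T\} = \boldsymbol{\Sigma}^{-1/2} E\{ g_K^2(\mathbf{z}^T\mathbf{z})\, \mathbf{z}\mathbf{z}^T \} \boldsymbol{\Sigma}^{-1/2} = E\{ g_K^2(\mathbf{z}^T\mathbf{z})\, z_l^2 \}\, \boldsymbol{\Sigma}^{-1} = \kappa\mathbf{R}^{-1}$, if $K \geq 2$, where $\kappa \triangleq \rho\, E\{ g_K^2(r^2)\, r^{K+1} \}\, \frac{2\pi^{K/2}}{K\,\Gamma(K/2)}$.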
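The relation $\boldsymbol{\Gamma} = \kappa\mathbf{R}^{-1}$ can be sanity-checked by Monte Carlo for a concrete elliptical family. The sketch below uses the multivariate Student-t distribution, which is our choice for illustration and not one of the distributions studied in the paper; for it, $g_K(u) = (\nu + K)/(\nu + u)$, $\rho = \nu/(\nu - 2)$, and the well-known location Fisher information gives $\kappa = \frac{\nu + K}{\nu + K + 2}\cdot\frac{\nu}{\nu - 2}$. All function names are hypothetical.

```python
import numpy as np


def sample_student_t(rng, n, nu, sigma):
    """Draw n samples of a zero-mean multivariate t with dispersion sigma."""
    k = sigma.shape[0]
    gauss = rng.multivariate_normal(np.zeros(k), sigma, size=n)
    scale = np.sqrt(rng.chisquare(nu, size=n) / nu)
    return gauss / scale[:, None]


def student_t_score(x, nu, sigma):
    """Score phi(x) = -d log p / dx = g(u) Sigma^{-1} x, u = x^T Sigma^{-1} x."""
    k = sigma.shape[0]
    sigma_inv_x = np.linalg.solve(sigma, x.T).T
    u = np.sum(x * sigma_inv_x, axis=1)
    g = (nu + k) / (nu + u)
    return g[:, None] * sigma_inv_x


def kappa_student_t(nu, k):
    """Closed-form kappa for the multivariate t (requires nu > 2)."""
    return (nu + k) / (nu + k + 2.0) * nu / (nu - 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, nu = 3, 8.0
    a = rng.standard_normal((k, k))
    sigma = a @ a.T + k * np.eye(k)
    phi = student_t_score(sample_student_t(rng, 200_000, nu, sigma), nu, sigma)
    gamma_hat = phi.T @ phi / len(phi)          # sample estimate of Gamma
    R = nu / (nu - 2.0) * sigma                 # covariance, R = rho * Sigma
    print(gamma_hat @ R)                        # close to kappa * I
    print(kappa_student_t(nu, k))
```

The product of the estimated score covariance with $\mathbf{R}$ should be close to $\kappa\mathbf{I}$ with $\kappa > 1$, as expected for a non-Gaussian elliptical source.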